Steal from http://www.biostat.umn.edu/~melanie/PH5482/2002/Lecture5/lecture52.html
Web references I've found on the subject of SEM
- http://faculty.ucr.edu/~hanneman/soc203b/lectures/sem_lat.html
- http://www2.chass.ncsu.edu/garson/pa765/structur.htm
Papers with suggestions on how to write up the results of a structural equation model.
- Hoyle and Panter (1995) "Writing about Structural Equation Models" in Structural Equation Modeling: Concepts, Issues, and Applications, ed. Hoyle, pp. 158-176.
- Raykov, T., Tomer, A., and Nesselroade, J. (1991) "Reporting structural equation modeling results in Psychology and Aging: Some proposed guidelines," Psychology and Aging, Vol. 6, No. 4, 499-503. (Thanks to Muree Larson-Bright for finding this article Nov. 2002)
Structural Equation Modeling (Hybrid Models)
- Path analysis with latent variables
- A two-step approach (a common reference: Anderson, J.C. and Gerbing, D.W. (1988), Psychological Bulletin)
- Measurement model (specifies relationships between the latent variables and their indicators)
- Structural model (specifies relationships between latent variables)
- When we perform a test of an SEM, we are simultaneously testing whether the combined measurement and structural model is adequate to explain the structure of the data.
Handout of example from Hatcher: 6 underlying latent variables. First, fit the CFA model to the 18 observed variables with 6 factors, letting every factor simply be correlated with every other factor. The second step is to fit the hypothesized direct relationships between the latent factors; this basically puts restrictions on the PHI matrix.
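To make the implied covariance concrete, here is a minimal numpy sketch with entirely hypothetical numbers (2 factors and 4 indicators rather than Hatcher's 6 and 18); in LISREL notation the measurement model implies Sigma = Lambda Phi Lambda' + Theta:

    import numpy as np

    # Hypothetical toy model: 4 indicators, 2 factors, simple structure
    Lambda = np.array([[0.8, 0.0],   # each indicator loads on one factor
                       [0.7, 0.0],
                       [0.0, 0.9],
                       [0.0, 0.6]])
    Phi = np.array([[1.0, 0.4],      # step 1: factors freely correlated
                    [0.4, 1.0]])
    Theta = np.diag([0.36, 0.51, 0.19, 0.64])  # measurement error variances

    # Model-implied covariance of the observed variables
    Sigma = Lambda @ Phi @ Lambda.T + Theta

Step two then replaces the free PHI above with the covariance structure implied by the hypothesized direct paths among the factors.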
Identifiability
- Lots of rules for determining identifiability, but none of the simple ones is both sufficient and necessary
- Just because you have positive degrees of freedom doesn't mean the model is identified
- Here are some rules that are sufficient for identifiability when the factors all have ``simple structure'' (i.e. each observed variable loads on only one factor) and the model is recursive
- If each factor has at least 3 indicators then the model is identified
- If a factor has only two indicators, then the model is identified provided that factor is correlated with at least one other factor.
- If a factor has only one indicator then you need to do one of the following
- Assume that it measures the latent variable without error OR
- Use some estimate of its reliability (perhaps Cronbach's alpha) and fix its error variance. That is, if you know the reliability of the scale is .7 then fix its measurement error variance to (1-.7)*(Variance of the scale), as in the sketch after this list.
- For other models, check out order and rank conditions (Maruyama and Kline discuss these) or just go ahead and try to fit the model and let software tell you whether it is identified or not.
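A tiny numeric sketch of that reliability fix, with a made-up scale variance:

    # Hypothetical numbers: Cronbach's alpha = .7, sample variance of scale = 2.5
    reliability = 0.7
    scale_variance = 2.5
    fixed_error_variance = (1 - reliability) * scale_variance  # fix at 0.75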
Heywood Cases
- When a parameter is estimated to be a value that lies outside of its known range of possibilities (i.e. outside the parameter space).
- Most common scenario: the maximum likelihood estimate for a measurement error variance turns out to be NEGATIVE.
Why?
- model is very misspecified
- small sample size
- the true value of the variance is just very close to zero but unfortunately was estimated to be negative (you can check this by fixing the error variance to equal zero and then doing a chi-square difference test, sketched after this list)
- Another scenario, not as easy to detect, is when the estimated covariance matrix of the latent factors is not positive definite. This is also caused by model misspecification and/or small sample sizes
- For both of these, AMOS will say ``this solution is not admissible''.
- Occurs more commonly when there are only two indicators per latent variable
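A minimal sketch of that chi-square difference check, using scipy and entirely hypothetical fit statistics:

    from scipy.stats import chi2

    # Hypothetical values: the model with the error variance free vs. the
    # same model with that error variance fixed at zero
    chisq_free, df_free = 52.3, 48
    chisq_fixed, df_fixed = 53.1, 49

    p_value = chi2.sf(chisq_fixed - chisq_free, df_fixed - df_free)
    # A nonsignificant difference suggests the true variance is near zero,
    # i.e. the Heywood case is sampling fluctuation, not misspecification.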
GOODNESS OF FIT INDICES. Handout of AMOS manual appendix.
- Problems with the chi-squared test as sample size increases: with a large enough N, even trivial discrepancies between the model and the data become statistically significant
- NFI (Normed Fit Index) - compares your model to the independence model; rule of thumb: > .9 is acceptable. NFI = 1 - (chi-square of your model) / (chi-square of the independence model)
- AIC - adds a penalty to the chi-squared statistic for additional parameters; at some point adding parameters just models the noise, not the signal. Compare AIC across different models; smaller AIC is better.
- RMSEA - sqrt( max(chisquare - df, 0) / (N*df) ) - Root mean square error of approximation; rule of thumb: < .05 indicates close fit and < .08 reasonable fit, but RMSEA > .1 is bad fit. Check out Browne and Cudeck (1993) "Alternative Ways of Assessing Model Fit" in Testing Structural Equation Models, eds. Bollen and Long, pp. 136-162. (All three indices are computed in the sketch after this list.)
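A short sketch computing the three indices from hypothetical chi-square output. Two assumptions to note: AIC is written in the chi-square form (chi-square + 2q) that AMOS reports, and the RMSEA denominator follows these notes' N*df (some texts use (N-1)*df):

    import math

    # Hypothetical output for a fitted model and the independence model
    chisq, df = 61.4, 48
    chisq_indep = 840.0
    n_params = 30     # free parameters in the fitted model
    N = 250           # sample size

    nfi = 1 - chisq / chisq_indep                     # want > .9
    aic = chisq + 2 * n_params                        # smaller is better
    rmsea = math.sqrt(max(chisq - df, 0) / (N * df))  # < .05 close fit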
CROSS VALIDATION.
- A good way to establish a model (especially in circumstances where changes to the model are made using the data) is to use cross-validation
- When the sample size is large enough, split the sample in half, determine the best model on one half of the data (the calibration sample), and then examine the fit of that model applied to the second half (the validation sample).
- Assessing how well the model based on the calibration sample fits the validation sample can be done in several ways.
- For an overview... MacCallum, R.C., Roznowski, M., Mar, M., and Reith, J.V. (1994) "Alternative strategies for cross-validation of covariance structure modeling." Multivariate Behavioral Research, 29, 1-32.
- Cudeck and Browne (1983) "Cross validation of covariance structures," Multivariate Behavioral Research, 18, 147-167, develop a cross-validation index (CVI) that looks at the discrepancy between the estimated model covariance matrix based on the calibration sample and the sample covariance matrix of the validation sample (a sketch follows this list).
- Cross-validation IN AMOS... Treat the calibration and validation samples as two different groups and test whether equality constraints across the groups significantly affect the model fit. Treating each sample as a separate group, fit the model (developed on the calibration sample) to both groups with and without equality constraints across the groups and perform a chi-squared difference test. If there is no significant difference, then we cannot reject that the model fits the two groups (samples) equally well.
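A minimal numpy sketch of the cross-validation index, assuming the standard ML discrepancy function and toy 2x2 matrices:

    import numpy as np

    def ml_discrepancy(S, Sigma):
        # ML discrepancy: F(S, Sigma) = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p
        p = S.shape[0]
        _, logdet_Sigma = np.linalg.slogdet(Sigma)
        _, logdet_S = np.linalg.slogdet(S)
        return logdet_Sigma - logdet_S + np.trace(S @ np.linalg.inv(Sigma)) - p

    S_val = np.array([[1.00, 0.45],
                      [0.45, 1.00]])      # sample covariance, validation half
    Sigma_cal = np.array([[1.00, 0.40],
                          [0.40, 1.00]])  # model-implied covariance from the
                                          # calibration-half fit
    cvi = ml_discrepancy(S_val, Sigma_cal)  # smaller = cross-validates better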
Alternatives to doing a full SEM.
These methods eliminate the explicit use of latent variables when modeling observed data for which there is some hypothesized underlying latent variable.
Rather than use the CFA model directly in the SEM, alternative methods take the results from the CFA and...
- Choose one observed variable to represent each factor. Choosing the one with the largest standardized factor loading is common. When more than one variable has a similarly large loading, choose the variable with the better substantive interpretation.
- Add up (or average) the indicators of each factor and use this sum to represent the factor.
- Create a factor score estimate, the predicted value of the factor given the observations. SEE HANDOUT. AMOS only provides the weights; other software (SAS, SPSS, MPLUS) actually takes the additional step of calculating the factor score estimate for each observation. (A sketch of this construction follows below.)
- A very detailed website about factor scores is at http://psychology.okstate.edu/faculty/jgrice/factorscores/
Each of these methods loses information because the surrogate used for each factor will subsequently be treated as if it measured the factor perfectly, which we know is not true.
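For concreteness, a small numpy sketch of regression-method (Thurstone) factor score estimates, one common construction, with entirely hypothetical loadings and error variances:

    import numpy as np

    # Hypothetical one-factor, three-indicator measurement model
    Lambda = np.array([[0.8], [0.7], [0.6]])   # loadings
    Phi = np.array([[1.0]])                    # factor variance
    Theta = np.diag([0.36, 0.51, 0.64])        # measurement error variances
    Sigma = Lambda @ Phi @ Lambda.T + Theta    # implied covariance

    W = Phi @ Lambda.T @ np.linalg.inv(Sigma)  # regression score weights
    X = np.array([[ 0.5,  0.2, -0.1],
                  [-0.4, -0.6,  0.1]])         # centered observations
    factor_scores = X @ W.T                    # one score estimate per row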
Recent research topics in SEM that I will not cover
- Multi-level analysis (nested samples). Incorporating random effects due to clinics or schools into the model. MPLUS can do this.
- Nonlinear relationships between factors. SEM is limited to simple linear relations among all variables (because it analyzes only covariances)